upgrade to vllm 0.11.2 #4400
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.
Code Review
This pull request upgrades the vLLM dependency to version v0.11.2 and adapts the codebase to the corresponding upstream changes. The modifications are extensive and touch upon various components, including Dockerfiles, documentation, tests, and core implementation files.
Key changes include:
- Updating the `VLLM_TAG` in all Dockerfiles to `v0.11.2` and optimizing the git clone process.
- A significant architectural refactoring in the model runner, which splits the model execution logic into two distinct steps: `execute_model` for the forward pass and a new `sample_tokens` method for token sampling and applying grammar constraints. This change is consistently propagated through the worker and model runner implementations (see the sketch after this review).
- Adapting to API changes in `SchedulerOutput` by removing grammar-related fields, which are now handled in the new `sample_tokens` step.
- Updating the custom attention backend registration to align with vLLM's new decorator-based system.
- Refactoring model implementations such as `Qwen2.5-VL` and `Qwen3-Next` to align with upstream method renames and simplifications, including the removal of custom workarounds that are no longer necessary.
- Enhancements to the `MultiprocExecutor` to better support multi-node distributed execution.
The changes are well-integrated and appear to correctly adapt the project to the new vLLM version. I have not identified any critical or high-severity issues in this pull request.
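To make the new two-step flow concrete, here is a minimal sketch of a runner that separates the forward pass from sampling. Everything in it is an illustrative assumption (the class name, the cached-logits field, the `allowed_mask` argument, and the toy `Embedding` model); only the `execute_model` / `sample_tokens` names and the idea that grammar constraints are applied at sampling time come from the review above, so treat it as a sketch rather than the actual vllm-ascend implementation.

```python
import torch


class SketchModelRunner:
    """Illustrative only -- not the real vLLM / vllm-ascend model runner API."""

    def __init__(self, model: torch.nn.Module):
        self.model = model
        self._last_logits = None  # cached between the two steps

    @torch.inference_mode()
    def execute_model(self, input_ids: torch.Tensor) -> None:
        # Step 1: forward pass only; no sampling happens here.
        self._last_logits = self.model(input_ids)

    @torch.inference_mode()
    def sample_tokens(self, allowed_mask=None) -> torch.Tensor:
        # Step 2: optionally apply grammar constraints, then sample.
        assert self._last_logits is not None, "execute_model must run first"
        logits = self._last_logits[:, -1, :]  # logits of the last position
        if allowed_mask is not None:
            # Grammar masks are applied here, at sampling time, rather than
            # travelling inside SchedulerOutput.
            logits = logits.masked_fill(~allowed_mask, float("-inf"))
        probs = torch.softmax(logits, dim=-1)
        return torch.multinomial(probs, num_samples=1)


# Toy usage: an Embedding layer stands in for the model; its output is
# treated as per-token "logits" purely to exercise the two-step flow.
runner = SketchModelRunner(torch.nn.Embedding(16, 16))
runner.execute_model(torch.tensor([[1, 2, 3]]))
next_token = runner.sample_tokens()
```

Keeping the forward pass and the sampling step as separate calls is what lets the scheduler drop its grammar-related fields: the constraint mask only needs to exist by the time `sample_tokens` runs.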
@leo-pony
visible_device_count = (torch.npu.device_count()
                        if torch.npu.is_available() else 0)
assert self.parallel_config.local_world_size <= visible_device_count, (
ray error
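To show why the snippet above can trip under Ray, here is a hedged completion of the truncated assertion. The error message, the hard-coded `local_world_size`, and the premise that a Ray worker only sees the NPUs assigned to it (making `torch.npu.device_count()` smaller than the node-local world size) are assumptions for illustration, not the actual patch.

```python
import torch

# Sketch only: names and the assertion message are assumptions; the original
# snippet above is truncated, so this is not the actual vllm-ascend code.
local_world_size = 8  # number of workers expected on this node

visible_device_count = (torch.npu.device_count()
                        if torch.npu.is_available() else 0)
assert local_world_size <= visible_device_count, (
    f"local_world_size ({local_world_size}) exceeds the number of visible "
    f"NPUs ({visible_device_count}). Under Ray, each worker process may only "
    f"see the devices assigned to it, which makes this check fail.")
```

If that is indeed the failure mode, the check would need to compare against the per-worker visible device count (or be relaxed when running under Ray), which matches "ray doesn't work" in the known issues below.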
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Let's address the known issues in follow-up PRs.
Bump vLLM version to v0.11.2

What's broken and changed by vLLM:
1. structured_output is broken by vllm-project/vllm#26866
2. get_mrope_input_positions is broken by vllm-project/vllm#28399
3. graph mode is broken by vllm-project/vllm#25110; we'll upgrade torch to 2.8 later to fix this
4. embedding is broken by vllm-project/vllm#27583
5. `get_attn_backend_cls` and the attention backend are broken by vllm-project/vllm#28534
6. spec decode is broken by vllm-project/vllm#28771
7. the sp feature is broken by vllm-project/vllm#27126
8. mtp is broken by vllm-project/vllm#27922
9. lora is broken by vllm-project/vllm#21068
10. execute_model is broken by vllm-project/vllm#26866
11. the `VLLM_DISABLE_SHARED_EXPERTS_STREAM` env is broken by vllm-project/vllm#28159
12. kv cache is broken by vllm-project/vllm#27753
13. dp is broken by vllm-project/vllm#25110

What's broken and changed by ourselves:
1. qwen vl is broken by vllm-project/vllm#28455. We'll remove the model files in the future to avoid this kind of error.
2. Engine core is broken by vllm-project/vllm#23691. We'll remove the patch file in the future.
3. Ascend scheduler is broken by vllm-project/vllm#28733. We'll remove the Ascend scheduler later.
4. qwen3-next is broken by vllm-project/vllm#28083. We'll remove the model files in the future to avoid this kind of error.
5. qwen vl is broken by vllm-project/vllm#27764. We'll remove the model files in the future.

Known issues:
1. ray doesn't work
2. the accuracy of qwen3-next is not correct
3. qwen3-vl is broken
4. prefix cache + ascend scheduler + deepseek v2 lite is broken

Co-authored-by: MengqingCao <[email protected]>
Co-authored-by: hfadzxy <[email protected]>
Co-authored-by: leo-pony <[email protected]>
Co-authored-by: 22dimensions <[email protected]>
Co-authored-by: shen-shanshan <[email protected]>

- vLLM version: v0.11.2

Signed-off-by: wangxiyuan <[email protected]>
Signed-off-by: MengqingCao <[email protected]>
Signed-off-by: hfadzxy <[email protected]>
Signed-off-by: leo-pony <[email protected]>
Signed-off-by: Kurumi5210 <[email protected]>